SDP-based Joint Sensor and Controller Design for 
Information-regularized Optimal LQG Control 

Takashi Tanaka, Henrik Sandberg 


Abstract: We consider a joint sensor and controller design problem for linear Gaussian stochastic systems in which a weighted sum of quadratic control cost and the amount of information acquired by the sensor is minimized. This problem formulation is motivated by situations where a control law must be designed in the presence of sensing, communication, and privacy constraints. We show that the optimal joint sensor-controller design is relatively easy when the sensing policy is restricted to be linear. Namely, an explicit form of the optimal linear sensor equation, the Kalman filter, and the certainty equivalence controller that jointly solve the problem can be efficiently found by semidefinite programming (SDP). Whether the linearity assumption in our design is restrictive or not is currently an open problem. 

Notation 

Lower-case bold characters such as $\mathbf{x}$ are used to represent random variables. By $\mathbf{x} \sim \mathcal{N}(\mu, \Sigma)$, we mean that $\mathbf{x}$ is a multi-dimensional Gaussian random variable with mean vector $\mu$ and covariance matrix $\Sigma$. If $\mathbf{x}_1, \mathbf{x}_2, \cdots$ is a sequence of random variables, we write $\mathbf{x}^t = (\mathbf{x}_1, \cdots, \mathbf{x}_t)$. Let $\mathbb{S}^n_{++}$ (resp. $\mathbb{S}^n_{+}$) be the space of n-dimensional real-valued symmetric positive definite (resp. positive semidefinite) matrices. A condition $M \in \mathbb{S}^n_{++}$ (resp. $M \in \mathbb{S}^n_{+}$) is also written as $M \succ 0$ (resp. $M \succeq 0$). For a real-valued vector $x \in \mathbb{R}^n$ and a positive semidefinite matrix $Q \succeq 0$, we write $\|x\|_Q^2 = x^\top Q x$. 

I. Introduction 

The classical LQG control theory is not concerned with 
the information-theoretic cost of communication between 
the sensor and controller devices. However, communication 
could be a costly process in practice due to various reasons. 
Motivated by such situations, in this paper, we consider 
a joint sensor and controller design problem, aiming at 
minimizing the communication between these devices. 

Fig. 1. Information-regularized LQG control problem 

In Fig. 1, the dynamical system block represents a linear stochastic system 

$$\mathbf{x}_{t+1} = A_t \mathbf{x}_t + B_t \mathbf{u}_t + \mathbf{w}_t, \quad t = 1, \cdots, T \tag{1}$$

where $\mathbf{x}_1 \sim \mathcal{N}(0, P_{1|0})$ and $\mathbf{w}_t \sim \mathcal{N}(0, W_t)$, $t = 1, \cdots, T$, are independent random vectors. We assume $P_{1|0} \succ 0$ and $W_t \succ 0$ for every $t = 1, \cdots, T$. Suppose $\mathbf{x}_t \in \mathbb{R}^{n_t}$, $\mathbf{u}_t \in \mathbb{R}^{m_t}$, and dimensions can be time varying. The sensor block is a data processing unit that has access to the entire history of the state variables $\mathbf{x}^t$, the history of control inputs $\mathbf{u}^{t-1}$ and signals $\mathbf{y}^{t-1}$ it has generated in the past, and generates a signal $\mathbf{y}_t \in \mathbb{R}^{r_t}$ at time step $t$. We denote by $\pi_s$ the space of the sensor's policies, whose mathematical description will be specified shortly. The controller block is another data processing unit that has access to $\mathbf{y}^t$ and $\mathbf{u}^{t-1}$, and generates a control input $\mathbf{u}_t$ at time step $t$. We denote by $\pi_c$ the space of the controller's policies. We are interested in jointly designing the sensor's and the controller's decision-making policies to solve the following optimization problem: 

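$$\min_{\pi_s \times \pi_c} \; J_{\text{cont}} + J_{\text{info}} \tag{2}$$

$$J_{\text{cont}} = \sum_{t=1}^{T} \frac{1}{2} E\left( \|\mathbf{x}_{t+1}\|_{Q_t}^2 + \|\mathbf{u}_t\|_{R_t}^2 \right)$$

$$J_{\text{info}} = \sum_{t=1}^{T} \gamma_t I(\mathbf{x}_t; \mathbf{y}_t | \mathbf{y}^{t-1}, \mathbf{u}^{t-1}).$$

We assume that $Q_t \succeq 0$ and $R_t \succ 0$ for every $t = 1, \cdots, T$. The term $I(\mathbf{x}_t; \mathbf{y}_t | \mathbf{y}^{t-1}, \mathbf{u}^{t-1})$ denotes the conditional mutual information [1], and $\gamma_t$ is a positive scalar for every $t = 1, \cdots, T$. We call (2) the information-regularized LQG control problem. 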

In the standard LQG control theory, the sensing policy is typically assumed to be 

$$\mathbf{y}_t = C_t \mathbf{x}_t + \mathbf{v}_t, \quad \mathbf{v}_t \sim \mathcal{N}(0, V_t) \tag{3}$$

where $\mathbf{v}_t$ is a white Gaussian stochastic process and the matrices $\{C_t, V_t\}_{t=1}^{T}$ are given. Due to the well-known separation principle, the optimal controller policy for the standard LQG control problem $\min_{\pi_c} J_{\text{cont}}$ can be found by solving forward and backward Riccati recursions. In (2), in contrast, we do not assume (3) and allow sensors to be any causal data collecting mechanism in $\pi_s$. However, $\min_{\pi_s \times \pi_c} J_{\text{cont}}$ is an uninteresting problem, since trivially $\mathbf{y}_t = \mathbf{x}_t$ (perfect observation) together with the linear-quadratic regulator (LQR) is optimal. To exclude this trivial solution, we aim at minimizing $J_{\text{cont}} + J_{\text{info}}$ as in (2), which amounts to charging the cost $\gamma_t$ for every bit of innovative information collected by the sensor at time step $t$. Notice that full observation $\mathbf{y}_t = \mathbf{x}_t$ results in $J_{\text{info}} = +\infty$. 

Footnote: Suppose $\gamma_t = 1$ for all $t = 1, \cdots, T$. Under the sensor-control architecture we propose in Section IV, it can be shown that $J_{\text{info}}$ coincides with the quantity known as the directed information [2]. 

Although we are not aware of a complete solution to (2), we here provide an SDP-based algorithm to construct an optimal linear sensor-controller joint policy. This result turns out to be an extension of an SDP-based algorithm for the sequential rate-distortion problem proposed in [3]. 

II. Applications and Related Work 

In this section, we briefly summarize connections between the information-regularized LQG control problem (2) and related work in the control, information theory, robotics, social science, and economics literature. 

A. Control over a communication channel 

In Fig. 1, suppose that agents A and B are geographically separated, and the communication channel from A to B is band-limited. Suppose that agents A and B collaborate to design the sensor and controller blocks. What kind of data should the sensor then collect and transmit, so that B can generate a satisfactory control signal in real time? 

Feedback control over noisy channels has been a popular research topic in the past two decades. Most of the early contributions focus on stabilization of unstable dynamical systems using feedback control over band-limited communication channels. A very partial list of papers in this context is [4]-[10]. This research direction naturally leads to trade-off studies between the achievable control performance and the required capacity of the sensor-controller communication channel. If the communication rate is finite, larger block length (achieving high resolution) is not necessarily preferred since the resulting delay leads to the loss of control performance [11]. LQG control performance subject to capacity constraints is considered in [12], where a certain "separation principle" between control design and communication design is reported. The authors of [13] consider a fundamental performance limitation of the finite horizon minimum-variance control (MVC) over noisy communication channels in the LQG regime. More comprehensive literature surveys on control designs over communication channels are available in [14]-[17]. 

However, the majority of the existing work in this context assumes sensor models and/or channel models a priori, and is different from (2). A few exceptions include [18] and [19], where sensor-controller joint design problems are considered. However, these works are concerned with sensor power constraints rather than information constraints, and are different from (2). Our problem formulation (2) falls into a general class of sensor optimization problems considered in [20], where several results are derived regarding the convexity of the problem and the existence of an optimal solution under different choices of topologies in the space of sensors. However, no structural results on specific problems appear there. 

B. Bounded rationality 

Broadly, the term bounded rationality is used to refer to the limited ability of decision makers (human or robot) to acquire and process information. The rational inattention model introduced by [21] in the economics literature characterizes bounded rationality using the idea of Shannon's channel capacity. Inspired by this model, recently [22] considered an information-constrained LQG control problem, which is similar to (2). In this paper, we remove the somewhat restrictive assumptions made in [22], including that the controller there is a time-invariant function of the current state only. Furthermore, our SDP-based approach is powerful in handling multi-dimensional systems, while [22] is currently restricted to scalar systems. 

C. Privacy-preserving control 

In Fig. 1, suppose that agent A can privately observe its internal state $\mathbf{x}_t$, and that $\mathbf{x}_t$ must be controlled by an external agent B through control input $\mathbf{u}_t$. At every time step, a message $\mathbf{y}_t$ containing information about the current state $\mathbf{x}_t$ is created by agent A and is sent to agent B, so that B can compute desirable control inputs. However, sending $\mathbf{y}_t = \mathbf{x}_t$ may not be desirable for a privacy-aware agent A, since this means a complete loss of privacy. What is then the optimal message $\mathbf{y}_t$? 

Suppose that the loss of privacy caused by disclosing $\mathbf{y}_t$ at time step $t$ is quantified by the conditional mutual information $I(\mathbf{x}_t; \mathbf{y}_t | \mathbf{y}^{t-1}, \mathbf{u}^{t-1})$. Conditioning on $\mathbf{y}^{t-1}$ and $\mathbf{u}^{t-1}$ reflects the fact that agent B knows a realization of these random variables by the time it receives a new message $\mathbf{y}_t$. (Similar quantities are used to evaluate privacy in wiretap channel problems [23], as well as in more recent database literature [24], [25].) Introducing the "price of privacy" $\gamma_t$, the optimal privacy-preserving control problem can be formulated as (2). In contrast, [26] employs differential privacy as a privacy measure in dynamic state estimation problems. 

III. Problem Formulation 
A. Information-regularized LQG control problem 

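In this paper, both the sensor's and controller's policies are modeled by Borel-measurable stochastic kernels. Set $\mathcal{X} = \mathbb{R}^n$ and $\mathcal{Y} = \mathbb{R}^m$ and let $\mathcal{B}_{\mathcal{X}}$ and $\mathcal{B}_{\mathcal{Y}}$ be the Borel $\sigma$-algebras on $\mathcal{X}$ and $\mathcal{Y}$ respectively, with respect to the usual topology. 

Definition 1: A Borel-measurable stochastic kernel from $(\mathcal{X}, \mathcal{B}_{\mathcal{X}})$ to $(\mathcal{Y}, \mathcal{B}_{\mathcal{Y}})$ is a map $q(\cdot|\cdot): \mathcal{B}_{\mathcal{Y}} \times \mathcal{X} \to [0, 1]$ such that 

• $q(\cdot|x)$ is a probability measure on $(\mathcal{Y}, \mathcal{B}_{\mathcal{Y}})$ for every $x \in \mathcal{X}$. 

• $q(E|\cdot)$ is $\mathcal{B}_{\mathcal{X}}$-measurable for every $E \in \mathcal{B}_{\mathcal{Y}}$. 

A Borel-measurable stochastic kernel from $(\mathcal{X}, \mathcal{B}_{\mathcal{X}})$ to $(\mathcal{Y}, \mathcal{B}_{\mathcal{Y}})$ will be simply referred to as a stochastic kernel from $\mathcal{X}$ to $\mathcal{Y}$, and denoted by $q(dy|x)$. The space of stochastic kernels from $\mathcal{X}$ to $\mathcal{Y}$ is denoted by $\mathcal{Q}_{y|x}$. 

The sensor's policy at time $t$ is a stochastic kernel from $\mathcal{X}^t \times \mathcal{Y}^{t-1} \times \mathcal{U}^{t-1}$ to $\mathcal{Y}_t$. The controller's policy at time $t$ is a stochastic kernel from $\mathcal{Y}^t \times \mathcal{U}^{t-1}$ to $\mathcal{U}_t$. Using the notation above, the policy spaces $\pi_s$ and $\pi_c$ are formally defined by 

$$\pi_s = \prod_{t=1}^{T} \mathcal{Q}_{y_t | x^t, y^{t-1}, u^{t-1}} \quad \text{and} \quad \pi_c = \prod_{t=1}^{T} \mathcal{Q}_{u_t | y^t, u^{t-1}}.$$

Then, (2) is an optimization problem over the sequences of stochastic kernels $\{q(dy_t | x^t, y^{t-1}, u^{t-1})\}_{t=1}^{T} \in \pi_s$ and $\{q(du_t | y^t, u^{t-1})\}_{t=1}^{T} \in \pi_c$. Once an element in $\pi_s \times \pi_c$ is picked, then a joint probability measure $p(dx^{T+1}, dy^{T}, du^{T})$ over $\mathcal{X}^{T+1} \times \mathcal{Y}^{T} \times \mathcal{U}^{T}$ is uniquely determined (see Proposition 7.28 in [27]). 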


B. Restricted problem 

To the best of the authors' knowledge, little is known about the structure of the optimal solution to (2). Namely, it is currently unknown whether there exists a jointly linear policy in $\pi_s \times \pi_c$ that attains optimality in (2). Hence, in this paper, we focus on a restricted problem in which the sensor's policy is restricted to the form (3). That is, we consider 

$$\min_{\pi_s^{\text{lin}} \times \pi_c} \; J_{\text{cont}} + J_{\text{info}} \tag{4}$$

where $\pi_s^{\text{lin}}$ is the space of sequences of stochastic kernels $\{q(dy_t | x_t)\}_{t=1}^{T}$ which can be realized by a linear sensor equation (3) with some $C_t$ and $V_t$ to be determined. We tackle this problem by applying an SDP-based solution to the sequential rate-distortion (SRD) problem obtained in [3]. Based on the existence of a linear optimal solution to the Gaussian SRD problem (as shown in [28]), we will show that (4) has a jointly linear optimal solution. 

IV. Summary of the Result 


In this section, we provide a complete solution to the restricted information-regularized LQG control problem (4). Specifically, we claim that the following numerical procedure allows us to explicitly construct the optimal stochastic kernels $\{q(dy_t | x_t)\}_{t=1}^{T} \in \pi_s^{\text{lin}}$ and $\{q(du_t | y^t, u^{t-1})\}_{t=1}^{T} \in \pi_c$ for (4). 

Step 1. (Controller design) Compute the backward Riccati recursion 

$$S_t = \begin{cases} Q_t & \text{if } t = T \\ Q_t + N_{t+1} & \text{if } t = 1, \cdots, T-1 \end{cases} \tag{5a}$$

$$M_t = B_t^\top S_t B_t + R_t \tag{5b}$$

$$N_t = A_t^\top (S_t - S_t B_t M_t^{-1} B_t^\top S_t) A_t \tag{5c}$$

$$K_t = -M_t^{-1} B_t^\top S_t A_t \tag{5d}$$

$$\Theta_t = K_t^\top M_t K_t \tag{5e}$$

The matrix $S_t$ is commonly understood in the LQR theory as the "cost-to-go" function, while $K_t$ is the optimal control gain. The auxiliary parameter $\Theta_t$ will be used in Step 2. 
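The recursion in Step 1 can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `backward_riccati` and the convention that the problem data are passed as Python lists of length T (list index 0 holding the matrices for t = 1) are assumptions made here.

```python
import numpy as np

def backward_riccati(A, B, Q, R):
    """Backward recursion (5a)-(5e); inputs are lists of T matrices (index 0 is t = 1)."""
    T = len(A)
    S, M, N, K, Theta = ([None] * T for _ in range(5))
    for t in reversed(range(T)):
        # (5a): S_T = Q_T, and S_t = Q_t + N_{t+1} for earlier stages.
        S[t] = Q[t] if t == T - 1 else Q[t] + N[t + 1]
        # (5b): M_t = B_t^T S_t B_t + R_t.
        M[t] = B[t].T @ S[t] @ B[t] + R[t]
        BtSA = B[t].T @ S[t] @ A[t]
        # (5d): K_t = -M_t^{-1} B_t^T S_t A_t (solve instead of explicit inverse).
        K[t] = -np.linalg.solve(M[t], BtSA)
        # (5c): N_t = A^T S A - A^T S B M^{-1} B^T S A, rewritten using K_t.
        N[t] = A[t].T @ S[t] @ A[t] + BtSA.T @ K[t]
        # (5e): Theta_t = K_t^T M_t K_t.
        Theta[t] = K[t].T @ M[t] @ K[t]
    return S, M, N, K, Theta
```

For a scalar two-stage problem with all data equal to one, calling `backward_riccati([I, I], [I, I], [I, I], [I, I])` with `I = np.eye(1)` gives the gain and weights for both stages; note that $N_t + \Theta_t = A_t^\top S_t A_t$ holds at every stage, which is a convenient consistency check.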

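Step 2. (Covariance scheduling) Solve a max-det problem with respect to $\{P_{t|t}, \Pi_t\}_{t=1}^{T}$ subject to the LMI constraints: 

$$\min \; \sum_{t=1}^{T} \left( \frac{1}{2} \mathrm{Tr}(\Theta_t P_{t|t}) - \frac{\gamma_t}{2} \log\det \Pi_t \right) + C \tag{6a}$$

$$\text{s.t.} \quad \Pi_t \succ 0, \quad t = 1, \cdots, T \tag{6b}$$

$$P_{t+1|t+1} \preceq A_t P_{t|t} A_t^\top + W_t, \quad t = 1, \cdots, T-1 \tag{6c}$$

$$P_{1|1} \preceq P_{1|0}, \quad P_{T|T} = \Pi_T \tag{6d}$$

$$\begin{bmatrix} P_{t|t} - \Pi_t & P_{t|t} A_t^\top \\ A_t P_{t|t} & W_t + A_t P_{t|t} A_t^\top \end{bmatrix} \succeq 0, \quad t = 1, \cdots, T-1 \tag{6e}$$

where $C$ is a constant. Due to the boundedness of the feasible set, (6) has an optimal solution. 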
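To build intuition for Step 2, consider the scalar single-stage case ($T = 1$, one-dimensional state). With $\Pi_1 = P_{1|1}$ from (6d), problem (6) reduces to minimizing $\frac{1}{2}\Theta_1 p - \frac{\gamma_1}{2}\log p$ over $0 < p \le P_{1|0}$, whose stationary point is $p = \gamma_1/\Theta_1$. The sketch below is only an illustration of this special case; the function name and argument names are hypothetical, and a general instance of (6) would require an SDP solver that supports log-determinant objectives.

```python
import numpy as np

def scalar_one_step_covariance(theta, gamma, p_prior):
    """Minimizer of 0.5*theta*p - 0.5*gamma*log(p) over 0 < p <= p_prior (scalar, T = 1)."""
    if theta <= 0.0:
        # The cost is decreasing in p, so the largest allowed covariance is optimal:
        # no information is acquired.
        return p_prior
    # The convex cost has its unconstrained minimum at gamma/theta; clip at the prior.
    return min(gamma / theta, p_prior)

# Brute-force comparison on a fine grid (illustrative numbers only).
theta, gamma, p_prior = 2.0, 1.0, 3.0
grid = np.linspace(1e-3, p_prior, 30001)
cost = 0.5 * theta * grid - 0.5 * gamma * np.log(grid)
p_grid = grid[np.argmin(cost)]
p_star = scalar_one_step_covariance(theta, gamma, p_prior)
```

When $\gamma_1/\Theta_1 \ge P_{1|0}$, the optimal posterior covariance equals the prior, which corresponds to the zero-dimensional sensor ($r_1 = 0$) described in Step 3: a higher price of information makes it more likely that sensing is skipped altogether.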

Step 3. (Sensor design) Set $r_t = \mathrm{rank}(P_{t|t}^{-1} - P_{t|t-1}^{-1})$ for every $t = 1, \cdots, T$, where 

$$P_{t|t-1} = A_{t-1} P_{t-1|t-1} A_{t-1}^\top + W_{t-1}, \quad t = 2, \cdots, T.$$

Choose matrices $C_t \in \mathbb{R}^{r_t \times n_t}$ and $V_t \in \mathbb{S}^{r_t}_{++}$ so that they satisfy 

$$C_t^\top V_t^{-1} C_t = P_{t|t}^{-1} - P_{t|t-1}^{-1} \tag{7}$$

for $t = 1, \cdots, T$. For instance, the singular value decomposition can be used. In particular, in case of $r_t = 0$, $C_t$ and $V_t$ are considered to be null (zero dimensional) matrices. 

Step 4. (Filter design) Determine the Kalman gains by 

$$L_t = P_{t|t-1} C_t^\top (C_t P_{t|t-1} C_t^\top + V_t)^{-1}. \tag{8}$$

If $r_t = 0$, $L_t$ is a null matrix. 
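Steps 3 and 4 can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the symmetric eigendecomposition (`numpy.linalg.eigh`), which coincides with the singular value decomposition for positive semidefinite matrices, and the function names and the rank tolerance `tol` are choices made here, not part of the paper.

```python
import numpy as np

def sensor_from_covariances(P_post, P_prior, tol=1e-9):
    """Return (C, V) with C^T V^{-1} C = P_post^{-1} - P_prior^{-1}, as required by (7)."""
    D = np.linalg.inv(P_post) - np.linalg.inv(P_prior)
    D = 0.5 * (D + D.T)                      # symmetrize against round-off
    eigval, eigvec = np.linalg.eigh(D)
    # Numerical rank: keep only eigenvalues that are clearly positive.
    keep = eigval > tol * max(1.0, np.abs(eigval).max())
    C = eigvec[:, keep].T                    # r_t x n_t, orthonormal rows
    V = np.diag(1.0 / eigval[keep])          # r_t x r_t, positive definite
    return C, V

def kalman_gain(P_prior, C, V):
    """Kalman gain (8): L = P C^T (C P C^T + V)^{-1}."""
    return P_prior @ C.T @ np.linalg.inv(C @ P_prior @ C.T + V)

# Illustrative data: only the first state coordinate is worth measuring.
P_prior = np.diag([2.0, 1.0])
P_post = np.diag([0.5, 1.0])
C, V = sensor_from_covariances(P_post, P_prior)
L = kalman_gain(P_prior, C, V)
```

In this example the sensor has dimension $r_t = 1$ and observes only the first coordinate; the standard covariance update $(I - L_t C_t) P_{t|t-1}$ then reproduces the scheduled posterior covariance `P_post`.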

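Step 5. (Policy construction) Using $\{C_t, V_t, L_t, K_t\}_{t=1}^{T}$ obtained above, define the sensor's policy $\{q(dy_t | x_t)\}_{t=1}^{T} \in \pi_s^{\text{lin}}$ by equation (3). When $r_t = 0$, the optimal dimension of the sensing vector $\mathbf{y}_t$ is zero, meaning that no sensing is the optimal strategy. On the other hand, define a controller's policy $\{q(du_t | y^t, u^{t-1})\}_{t=1}^{T} \in \pi_c$ by the certainty equivalence controller $\mathbf{u}_t = K_t \hat{\mathbf{x}}_t$ where $\hat{\mathbf{x}}_t = E(\mathbf{x}_t | \mathbf{y}^t, \mathbf{u}^{t-1})$ is obtained by the standard Kalman filter 

$$\hat{\mathbf{x}}_t = \hat{\mathbf{x}}_{t|t-1} + L_t (\mathbf{y}_t - C_t \hat{\mathbf{x}}_{t|t-1}) \tag{9a}$$

$$\hat{\mathbf{x}}_{t+1|t} = A_t \hat{\mathbf{x}}_t + B_t \mathbf{u}_t. \tag{9b}$$

When $r_t = 0$, (9a) is simply replaced by $\hat{\mathbf{x}}_t = \hat{\mathbf{x}}_{t|t-1}$. 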
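One time step of the controller in Step 5 can be sketched as below. The function name `controller_step` and its argument layout are hypothetical; the sketch assumes the matrices for the current time step have already been computed by Steps 1-4, and it represents a zero-dimensional sensor by a matrix `C` with zero rows.

```python
import numpy as np

def controller_step(x_pred, y, A, B, C, L, K):
    """Certainty equivalence control driven by the Kalman filter (9a)-(9b)."""
    if C.shape[0] == 0:
        # r_t = 0: no measurement is taken, so the estimate is the prediction.
        x_hat = x_pred
    else:
        x_hat = x_pred + L @ (y - C @ x_pred)   # measurement update (9a)
    u = K @ x_hat                               # certainty equivalence control
    x_pred_next = A @ x_hat + B @ u             # time update (9b)
    return x_hat, u, x_pred_next

# Scalar illustration with made-up numbers.
one = np.eye(1)
x_hat, u, x_next = controller_step(
    np.array([1.0]), np.array([2.0]), one, one, one, 0.5 * one, -0.6 * one)

# Same step with a zero-dimensional sensor (no sensing at this time step).
x_hat0, u0, x_next0 = controller_step(
    np.array([1.0]), np.zeros(0), one, one, np.zeros((0, 1)), np.zeros((1, 0)), -0.6 * one)
```

The sensor side of the loop would generate `y` from (3) with the `C` and `V` of Step 3; it is omitted here to keep the sketch short.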


Footnote: The constant $C$ in (6a) is given by 

$$C = \sum_{t=1}^{T-1} \left( \frac{n_t \gamma_t}{2} \log\frac{\gamma_{t+1}}{\gamma_t} + \frac{\gamma_t}{2} \log\det W_t \right) + \frac{\gamma_1}{2} \log\det P_{1|0} + \frac{1}{2} \mathrm{Tr}(N_1 P_{1|0}) + \frac{1}{2} \sum_{t=1}^{T} \mathrm{Tr}(W_t S_t).$$

Footnote: Our problem is different from the optimal LQG control over Gaussian channels, where a linear encoder-controller pair is optimal (e.g., [17], Ch. 11). 

Footnote: One can replace (6b) with $\Pi_t \succeq \epsilon I$ without altering the result. This conversion makes the feasible set compact and thus the Weierstrass theorem can be used. 

Theorem 1: There exists a joint sensor-controller policy in $\pi_s^{\text{lin}} \times \pi_c$ that attains optimality in (4). The optimal value of (4) coincides with the optimal value of the max-det problem (6). Furthermore, an optimal policy can be constructed by Steps 1-5. 


V. Derivation of the Main Result 

We first show that, once the sensor's policy $\{q(dy_t | x_t)\}_{t=1}^{T} \in \pi_s^{\text{lin}}$ is fixed, then $J_{\text{info}}$ does not depend on the choice of the controller's policy. This observation allows us to rewrite (4) as 

$$\min_{\pi_s^{\text{lin}}} \left( J_{\text{info}} + \min_{\pi_c} J_{\text{cont}} \right). \tag{10}$$

Then, we interpret (10) as a two-player Stackelberg game (see, e.g., [29]) in which the sensor agent (agent A in Fig. 1) is the leader and the controller agent (agent B) is the follower. If the sensor's policy is given, the controller's best response can be explicitly found by solving a stochastic optimal control problem $\min_{\pi_c} J_{\text{cont}}$. With an explicit expression of $\min_{\pi_c} J_{\text{cont}}$, we show that the outer optimization problem in (10) over $\pi_s^{\text{lin}}$ becomes the sequential rate-distortion problem [28], whose optimal solution can be constructed by solving an SDP problem [3]. 

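Fix a joint sensor-controller policy in $\pi_s^{\text{lin}} \times \pi_c$ and let $p(dx^{T+1}, dy^{T}, du^{T})$ be the resulting joint probability measure. Let $p(dx_t | y^{t-1}, u^{t-1})$ and $p(dx_t | y^{t}, u^{t-1})$ be probability measures obtained by conditioning and marginalizing $p(dx^{T+1}, dy^{T}, du^{T})$. It follows from the standard Kalman filtering theory that 

$$p(dx_t | y^{t-1}, u^{t-1}) \sim \mathcal{N}(\hat{x}_{t|t-1}, P_{t|t-1})$$

$$p(dx_t | y^{t}, u^{t-1}) \sim \mathcal{N}(\hat{x}_t, P_{t|t})$$

where $P_{t|t-1}$ and $P_{t|t}$ satisfy 

$$P_{t|t-1} = A_{t-1} P_{t-1|t-1} A_{t-1}^\top + W_{t-1} \tag{11a}$$

$$P_{t|t} = (P_{t|t-1}^{-1} + C_t^\top V_t^{-1} C_t)^{-1}, \tag{11b}$$

while $\hat{x}_t$ and $\hat{x}_{t|t-1}$ are recursively obtained by (9). Using matrices $\{P_{t|t}\}$ and $\{P_{t|t-1}\}$, the mutual information terms can be explicitly written as 

$$I(\mathbf{x}_t; \mathbf{y}_t | \mathbf{y}^{t-1}, \mathbf{u}^{t-1}) = h(\mathbf{x}_t | \mathbf{y}^{t-1}, \mathbf{u}^{t-1}) - h(\mathbf{x}_t | \mathbf{y}^{t}, \mathbf{u}^{t-1}) = \frac{1}{2} \log\det P_{t|t-1} - \frac{1}{2} \log\det P_{t|t}.$$

Therefore, 

$$J_{\text{info}} = \sum_{t=1}^{T} \gamma_t I(\mathbf{x}_t; \mathbf{y}_t | \mathbf{y}^{t-1}, \mathbf{u}^{t-1}) = \sum_{t=1}^{T-1} \left( \frac{\gamma_{t+1}}{2} \log\det P_{t+1|t} - \frac{\gamma_t}{2} \log\det P_{t|t} \right) + \frac{\gamma_1}{2} \log\det P_{1|0} - \frac{\gamma_T}{2} \log\det P_{T|T}.$$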

In particular, this result clearly shows that the mutual information terms are control-independent, since they are completely determined by (11) once the sequence of matrices $\{C_t, V_t\}_{t=1}^{T}$ is fixed. This observation justifies the equivalence between (4) and (10). 
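Since the information cost depends only on the covariance sequences, it can be evaluated without simulating the closed loop. The sketch below is one way to do this, assuming the prior covariances $P_{t|t-1}$ and posterior covariances $P_{t|t}$ are supplied as lists; the function name is hypothetical, and natural logarithms are used, so the mutual information is measured in nats (dividing by $\log 2$ converts to bits).

```python
import numpy as np

def information_cost(P_prior, P_post, gamma):
    """J_info = sum_t gamma_t * 0.5 * (log det P_{t|t-1} - log det P_{t|t}), in nats."""
    total = 0.0
    for Pp, Pf, g in zip(P_prior, P_post, gamma):
        # slogdet returns (sign, log|det|); both matrices are positive definite here.
        _, logdet_prior = np.linalg.slogdet(Pp)
        _, logdet_post = np.linalg.slogdet(Pf)
        total += 0.5 * g * (logdet_prior - logdet_post)
    return total

# One-step illustration: the variance of the first coordinate drops from 2 to 0.5.
J_info = information_cost([np.diag([2.0, 1.0])], [np.diag([0.5, 1.0])], [1.0])
```

In this illustration the cost equals $\frac{1}{2}\log 4 = \log 2$ nats, that is, exactly one bit is acquired about the first coordinate and nothing about the second.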

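Next, let us focus on the stochastic optimal control problem $\min_{\pi_c} J_{\text{cont}}$, whose solution is well understood. 

Lemma 1: For every fixed $\{q(dy_t | x_t)\}_{t=1}^{T} \in \pi_s^{\text{lin}}$, the certainty equivalence controller $\mathbf{u}_t = K_t \hat{\mathbf{x}}_t$ where $\hat{\mathbf{x}}_t = E(\mathbf{x}_t | \mathbf{y}^{t}, \mathbf{u}^{t-1})$ is an optimizer of $\min_{\pi_c} J_{\text{cont}}$. Moreover, 

$$\min_{\pi_c} J_{\text{cont}} = \frac{1}{2} \mathrm{Tr}(N_1 P_{1|0}) + \frac{1}{2} \sum_{k=1}^{T} \left( \mathrm{Tr}(W_k S_k) + \mathrm{Tr}(\Theta_k P_{k|k}) \right).$$

Proof: This is a standard result and can be shown by dynamic programming. A proof is provided in the Appendix. ■ 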

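Combining the results so far, we have shown that 

$$J_{\text{info}} + \min_{\pi_c} J_{\text{cont}} = \sum_{t=1}^{T-1} \left( \frac{1}{2} \mathrm{Tr}(\Theta_t P_{t|t}) + \frac{\gamma_{t+1}}{2} \log\det P_{t+1|t} - \frac{\gamma_t}{2} \log\det P_{t|t} \right) + \frac{1}{2} \mathrm{Tr}(\Theta_T P_{T|T}) - \frac{\gamma_T}{2} \log\det P_{T|T} + c \tag{12}$$

where $c = \frac{1}{2} \mathrm{Tr}(N_1 P_{1|0}) + \frac{\gamma_1}{2} \log\det P_{1|0} + \frac{1}{2} \sum_{k=1}^{T} \mathrm{Tr}(W_k S_k)$ is a constant. The expression (12) is the cost of the original problem (4) when the sensor model $\{C_t, V_t\}_{t=1}^{T}$ is fixed and the controller agent (Stackelberg follower) reacts with the best response. Notice that (12) is a function of the sequence $\{C_t, V_t\}_{t=1}^{T}$, since the matrices $P_{t|t}$ and $P_{t+1|t}$ are determined by (11). 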

Now we have formulated a problem for the sensor agent (Stackelberg leader). Namely, the sensor agent needs to find the optimal sequence of matrices $\{C_t, V_t\}_{t=1}^{T}$ (as well as their dimensions) that minimizes (12). Next, we show that this can be done very efficiently by solving a semidefinite programming problem. 

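Let us first focus on the quantity 

$$\frac{\gamma_{t+1}}{2} \log\det P_{t+1|t} - \frac{\gamma_t}{2} \log\det P_{t|t}. \tag{13}$$

Introducing $\tilde{A}_t = \sqrt{\gamma_{t+1}/\gamma_t}\, A_t$ and $\tilde{W}_t = (\gamma_{t+1}/\gamma_t)\, W_t$, 

$$\frac{2}{\gamma_t} \times (13) = \log\det(\tilde{A}_t P_{t|t} \tilde{A}_t^\top + \tilde{W}_t) - \log\det P_{t|t}$$

$$= \log\det \tilde{W}_t - \log\det (P_{t|t}^{-1} + \tilde{A}_t^\top \tilde{W}_t^{-1} \tilde{A}_t)^{-1} \tag{14a}$$

$$= \min_{\Pi_t} \; \log\det W_t - \log\det \Pi_t + n_t \log\frac{\gamma_{t+1}}{\gamma_t} \quad \text{s.t.} \quad 0 \prec \Pi_t \preceq (P_{t|t}^{-1} + A_t^\top W_t^{-1} A_t)^{-1} \tag{14b}$$

$$= \min_{\Pi_t} \; \log\det W_t - \log\det \Pi_t + n_t \log\frac{\gamma_{t+1}}{\gamma_t} \quad \text{s.t.} \quad \begin{bmatrix} P_{t|t} - \Pi_t & P_{t|t} A_t^\top \\ A_t P_{t|t} & W_t + A_t P_{t|t} A_t^\top \end{bmatrix} \succeq 0, \quad \Pi_t \succ 0. \tag{14c}$$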


Fig. 2. Orbital coordinate and desired attitude of a nadir-pointing satellite. 

We have used Sylvester's determinant theorem in step (14a). The quantity (14a) is equal to the optimal value of a constrained optimization problem (14b) with decision variable $\Pi_t$ (for fixed $P_{t|t}$), and this rewriting is possible because of the monotonicity of the determinant function. In (14c), the upper bound on $\Pi_t$ in (14b) is rewritten using the Schur complement formula. The final expression (14c) is particularly useful, since this is a max-det problem subject to linear matrix inequality (LMI) constraints. 
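The two matrix facts used in this rewriting can be checked numerically. The sketch below uses randomly generated placeholder matrices `A`, `P` and `W` (not data from the paper) to confirm the log-determinant identity obtained from Sylvester's determinant theorem and the Schur complement characterization of the bound on $\Pi_t$.

```python
import numpy as np

def logdet(X):
    """log det of a positive definite matrix."""
    return np.linalg.slogdet(X)[1]

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
G = rng.standard_normal((n, n))
H = rng.standard_normal((n, n))
P = G @ G.T + np.eye(n)          # random positive definite "covariance"
W = H @ H.T + np.eye(n)          # random positive definite noise covariance

# Identity behind (14a): log det(A P A^T + W) - log det P
#                        = log det W - log det (P^{-1} + A^T W^{-1} A)^{-1}.
lhs = logdet(A @ P @ A.T + W) - logdet(P)
bound = np.linalg.inv(np.linalg.inv(P) + A.T @ np.linalg.inv(W) @ A)
rhs = logdet(W) - logdet(bound)

def lmi(Pi):
    """Block matrix whose positive semidefiniteness encodes Pi <= bound."""
    return np.block([[P - Pi, P @ A.T], [A @ P, W + A @ P @ A.T]])

# Pi slightly below the bound satisfies the LMI; slightly above violates it.
min_eig_feasible = np.linalg.eigvalsh(lmi(0.9 * bound)).min()
min_eig_infeasible = np.linalg.eigvalsh(lmi(1.1 * bound)).min()
```

The smallest eigenvalue of the block matrix is positive for $\Pi = 0.9\,(P^{-1} + A^\top W^{-1} A)^{-1}$ and negative for $\Pi = 1.1\,(P^{-1} + A^\top W^{-1} A)^{-1}$, matching the Schur complement argument.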

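Applying the discussion above to every $t = 1, \cdots, T-1$, and introducing $\Pi_T = P_{T|T}$ for notational convenience, it follows from (12) that the optimal cost is equal to the value of the following optimization problem with respect to the decision variables $\{P_{t|t}, \Pi_t, C_t, V_t\}_{t=1}^{T}$: 

$$\min \; \sum_{t=1}^{T} \left( \frac{1}{2} \mathrm{Tr}(\Theta_t P_{t|t}) - \frac{\gamma_t}{2} \log\det \Pi_t \right) + C$$

$$\text{s.t.} \quad \begin{bmatrix} P_{t|t} - \Pi_t & P_{t|t} A_t^\top \\ A_t P_{t|t} & W_t + A_t P_{t|t} A_t^\top \end{bmatrix} \succeq 0, \quad t = 1, \cdots, T-1$$

$$\Pi_t \succ 0, \quad t = 1, \cdots, T$$

$$\Pi_T = P_{T|T}$$

$$P_{1|1}^{-1} - P_{1|0}^{-1} = C_1^\top V_1^{-1} C_1$$

$$P_{t|t}^{-1} - (A_{t-1} P_{t-1|t-1} A_{t-1}^\top + W_{t-1})^{-1} = C_t^\top V_t^{-1} C_t, \quad t = 2, \cdots, T.$$

The last two constraints are obtained by eliminating $P_{t|t-1}$ from (11). These equality constraints themselves are difficult to handle, but can be replaced by the inequality constraints 

$$0 \prec P_{1|1} \preceq P_{1|0}$$

$$0 \prec P_{t|t} \preceq A_{t-1} P_{t-1|t-1} A_{t-1}^\top + W_{t-1}.$$

These replacements eliminate the variables $\{C_t, V_t\}_{t=1}^{T}$, and convert the above optimization problem into an alternative problem with respect to $\{P_{t|t}, \Pi_t\}_{t=1}^{T}$ only, as shown in (6). The eliminated variables $\{C_t, V_t\}_{t=1}^{T}$ can be easily reconstructed by (7). 

Solving (6) allows us to optimally schedule the sequence of covariance matrices. The optimal covariance sequence can be attained by the Kalman filter (9). 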


VI. Example 


In this section, we design an attitude control law for a nadir-pointing spacecraft (Fig. 2) using magnetic torquers. Small deviations of the body coordinates from the orbital coordinates are measured by angles $\phi$, $\theta$ and $\psi$, and their dynamics is modeled by a linearized equation of motion borrowed from [30]. 



Fig. 3. Simulation (Sampling period = 2 min) 


Here, $\omega_0$ is the orbital rate ($2\pi$ [rad] / 90 [min]), $\mathrm{diag}(I_x, I_y, I_z)$ is the moment of inertia of the spacecraft, and ax = , ay = . The Earth's magnetic field vector $(b_x(t), b_y(t), b_z(t))$ in the orbital coordinate is time varying as the position of the spacecraft changes (the simplified model of the magnetic field shown in Fig. 3(a) will be used). $(u_x, u_y, u_z)$ is the control output of the magnetic torquers. Assume that the spacecraft has an attitude sensor that measures the state variables accurately, but communication between the attitude sensor and the magnetic torquers incurs cost. 

The information-regularized LQG control problem is formulated over the planning horizon of 140 minutes after converting (15) into a discrete-time model with sampling period of 2 minutes, and all necessary parameters are appropriately chosen. Fig. 3 shows a result of a sensor-controller joint strategy obtained by Steps 1-5. Fig. 3(b) shows the optimal assignment of the information rate $I(\mathbf{x}_t; \mathbf{y}_t | \mathbf{y}^{t-1}, \mathbf{u}^{t-1})$, $t = 1, \cdots, 70$. It can be seen that acquiring a lot of information at the beginning is advantageous in this example. Simulated control actions and deviation from the desired attitude are shown in subfigures (c) and (d). The dotted lines in (d) show the estimated deviations calculated by the Kalman filter (9). The resulting costs are $J_{\text{cont}} = 1.472$ and $J_{\text{info}} = 0.245$. For comparison purposes, (e) shows the case where the optimal linear quadratic regulator (LQR) is applied with the perfect measurement $\mathbf{y}_t = \mathbf{x}_t$ with the same noise realizations. In this case, we have $J_{\text{cont}} = 1.352$ and $J_{\text{info}} = +\infty$. It can be seen that the control performance in (d) is not much worse than in (e), even though the information rate required for (d) is drastically smaller. 

Fig. 4. Relationship between sequential rate-distortion problem and LQG optimal control problem. 

VII. Discussion and Future Works 

In this paper, we presented an SDP-based optimal joint sensor-controller synthesis for (restricted) information-regularized LQG control problems. Unfortunately, to the best of the authors' knowledge, it is not known whether the same architecture remains optimal in the fully general information-regularized LQG control problem (2). The technical difficulty here is that once nonlinear sensor policies in $\pi_s$ are allowed, the mutual information term $J_{\text{info}}$ is no longer control-independent in general, and the discussion in Section V does not hold. 

Finally, the information-regularized LQG control problem considered in this paper can be viewed as a preliminary step towards a unification of the classical LQG control problem and the Gaussian sequential rate-distortion problem (Fig. 4). In the classical LQG control problem where a sensor model is fixed, the estimator-controller separation principle is well-known. On the other hand, if a feedback control is not considered (or the controller is fixed), and $J_{\text{cont}}$ is replaced by an estimation error (distortion) criterion, the problem becomes the Gaussian sequential rate-distortion problem [28], and the sensor-estimator separation principle also holds [3]. 
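APPENDIX 

We consider $\min_{\pi_c} J_{\text{cont}}$ as a T-stage dynamic programming problem. The state of the system at stage $t$ is a joint probability measure $p(dx^{t}, dy^{t}, du^{t-1})$ which is updated by 

$$p(dx^{t+1}, dy^{t+1}, du^{t}) = q(dy_{t+1} | x_{t+1}) f(dx_{t+1} | x_t, u_t)\, q(du_t | y^{t}, u^{t-1})\, p(dx^{t}, dy^{t}, du^{t-1}).$$

Here, the stochastic kernel $f(dx_{t+1} | x_t, u_t)$ is given by (1), while $q(dy_{t+1} | x_{t+1})$ is the sensing policy, which is assumed to be fixed. The stochastic kernel $q(du_t | y^{t}, u^{t-1}) \in \mathcal{Q}_{u_t | y^{t}, u^{t-1}}$ is the control variable in this dynamic programming formulation. The associated Bellman's equation is 

$$J_t(p(dx^{t}, dy^{t}, du^{t-1})) = \min_{\mathcal{Q}_{u_t | y^{t}, u^{t-1}}} \left\{ \frac{1}{2} E\left( \|\mathbf{x}_{t+1}\|_{Q_t}^2 + \|\mathbf{u}_t\|_{R_t}^2 \right) + J_{t+1}(p(dx^{t+1}, dy^{t+1}, du^{t})) \right\}$$

with the boundary condition $J_{T+1}(\cdot) = 0$. 

Claim 1: For every $t = 1, \cdots, T$, the certainty equivalence controller $\mathbf{u}_t = K_t \hat{\mathbf{x}}_t$ where $\hat{\mathbf{x}}_t = E(\mathbf{x}_t | \mathbf{y}^{t}, \mathbf{u}^{t-1})$ is the optimal control policy in $\mathcal{Q}_{u_t | y^{t}, u^{t-1}}$. Moreover, for every $p(dx^{t}, dy^{t}, du^{t-1})$, 

$$J_t(p(dx^{t}, dy^{t}, du^{t-1})) = \frac{1}{2} E\|\mathbf{x}_t\|_{N_t}^2 + \frac{1}{2} \sum_{k=t}^{T} \left( \mathrm{Tr}(W_k S_k) + \mathrm{Tr}(\Theta_k P_{k|k}) \right). \tag{16}$$

Proof: Equation (16) holds when $t = T$ as 

$$J_T(p(dx^{T}, dy^{T}, du^{T-1})) = \min_{\mathcal{Q}_{u_T | y^{T}, u^{T-1}}} \frac{1}{2} E\left( \|A_T \mathbf{x}_T + B_T \mathbf{u}_T + \mathbf{w}_T\|_{Q_T}^2 + \|\mathbf{u}_T\|_{R_T}^2 \right)$$

$$= \min_{\mathcal{Q}_{u_T | y^{T}, u^{T-1}}} \frac{1}{2} E\left( \|\mathbf{x}_T\|_{N_T}^2 + \|\mathbf{w}_T\|_{Q_T}^2 + \|\mathbf{u}_T - K_T \mathbf{x}_T\|_{M_T}^2 \right)$$

$$= \frac{1}{2} E\|\mathbf{x}_T\|_{N_T}^2 + \frac{1}{2} \left( \mathrm{Tr}(W_T Q_T) + \mathrm{Tr}(\Theta_T P_{T|T}) \right).$$

Notice that in the second expression, $\mathbf{u}_T$ appears only in $\|\mathbf{u}_T - K_T \mathbf{x}_T\|_{M_T}^2$. By choosing $\mathbf{u}_T = K_T E(\mathbf{x}_T | \mathbf{y}^{T}, \mathbf{u}^{T-1})$, this quantity attains its minimum value $E\|K_T(\mathbf{x}_T - E(\mathbf{x}_T | \mathbf{y}^{T}, \mathbf{u}^{T-1}))\|_{M_T}^2 = \mathrm{Tr}(\Theta_T P_{T|T})$. So assume (16) holds for $t = l + 1$. Then 

$$J_l(p(dx^{l}, dy^{l}, du^{l-1})) = \min_{\mathcal{Q}_{u_l | y^{l}, u^{l-1}}} \left\{ \frac{1}{2} E\left( \|\mathbf{x}_{l+1}\|_{Q_l}^2 + \|\mathbf{u}_l\|_{R_l}^2 \right) + J_{l+1}(p(dx^{l+1}, dy^{l+1}, du^{l})) \right\}$$

$$= \min_{\mathcal{Q}_{u_l | y^{l}, u^{l-1}}} \left\{ \frac{1}{2} E\left( \|\mathbf{x}_{l+1}\|_{Q_l}^2 + \|\mathbf{u}_l\|_{R_l}^2 \right) + \frac{1}{2} E\|\mathbf{x}_{l+1}\|_{N_{l+1}}^2 + \frac{1}{2} \sum_{k=l+1}^{T} \left( \mathrm{Tr}(W_k S_k) + \mathrm{Tr}(\Theta_k P_{k|k}) \right) \right\}$$

$$= \min_{\mathcal{Q}_{u_l | y^{l}, u^{l-1}}} \left\{ \frac{1}{2} E\left( \|\mathbf{x}_{l+1}\|_{S_l}^2 + \|\mathbf{u}_l\|_{R_l}^2 \right) \right\} + \frac{1}{2} \sum_{k=l+1}^{T} \left( \mathrm{Tr}(W_k S_k) + \mathrm{Tr}(\Theta_k P_{k|k}) \right)$$

$$= \frac{1}{2} \sum_{k=l+1}^{T} \left( \mathrm{Tr}(W_k S_k) + \mathrm{Tr}(\Theta_k P_{k|k}) \right) + \min_{\mathcal{Q}_{u_l | y^{l}, u^{l-1}}} \frac{1}{2} E\left( \|A_l \mathbf{x}_l + B_l \mathbf{u}_l + \mathbf{w}_l\|_{S_l}^2 + \|\mathbf{u}_l\|_{R_l}^2 \right)$$

$$= \frac{1}{2} \sum_{k=l+1}^{T} \left( \mathrm{Tr}(W_k S_k) + \mathrm{Tr}(\Theta_k P_{k|k}) \right) + \frac{1}{2} E\|\mathbf{x}_l\|_{N_l}^2 + \frac{1}{2} \left( \mathrm{Tr}(W_l S_l) + \mathrm{Tr}(\Theta_l P_{l|l}) \right)$$

$$= \frac{1}{2} E\|\mathbf{x}_l\|_{N_l}^2 + \frac{1}{2} \sum_{k=l}^{T} \left( \mathrm{Tr}(W_k S_k) + \mathrm{Tr}(\Theta_k P_{k|k}) \right).$$

Noticing $E\|\mathbf{x}_1\|_{N_1}^2 = \mathrm{Tr}(N_1 P_{1|0})$, Lemma 1 follows from Claim 1. 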